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Abstract —We extend the theory of matrix completion to the 
case where we make Poisson observations for a subset of entries of 
a low-rank matrix. We consider the (now) usual matrix recovery 
formulation through maximum likelihood with proper constraints 
on the matrix M, and establish theoretical upper and lower 
bounds on the recovery error. Our bounds are nearly optimal 
up to a factor on the order of 0{\og{did2))* These bounds are 
obtained by adapting the arguments used for one-bit matrix 
completion Q (although these two problems are different in 
nature) and the adaptation requires new techniques exploiting 
properties of the Poisson likelihood function and tackling the 
difficulties posed by the locally sub-Gaussian characteristic of 
the Poisson distribution. Our results highlight a few important 
distinctions of Poisson matrix completion compared to the prior 
work in matrix completion including having to impose a minimum 
signal-to-noise requirement on each observed entry. We also 
develop an efficient iterative algorithm and demonstrate its good 
performance in recovering solar fiare images. 

Keywords—matrix completion, Poisson noise, high-dimensional 
statistics, information theory 

1. Introduction 

Matrix completion, with a goal of recovering a low-rank 
matrix M G fj-om observations of a subset of its entries, 

attracts much interests recently due to its important real world 
applications including the famous Netflix problem |[^. Poisson 
matrix completion, where the observations are Poisson counts 
of a subset of the entries, is an important instance in its own 
as it occurs from a myriads of applications including optical 
imaging, nuclear medicine, low-dose x-ray imaging and 
network traffic analysis 0. 

Recently, much success has been achieved in solving the 
matrix completion problem using nuclear norm minimization, 
partly inspired by the theory of compressed sensing 0. It has 
been shown that when M is low rank, it can be recovered 
from only a few observations on its entries (see, e.g. O-G!))- 
Earlier work on matrix completion typically assume that the 
observations are noiseless, i.e., we may directly observe a subset 
of entries of M. In the real world, however, the observations 
are noisy, which is the focus of the subsequent work GD- 
1^ , most of which consider a scenario where M is the sum 
of a low-rank matrix with a Gaussian random matrix, i.e., 
the observations are a subset of entries of M contaminated 
with Gaussian noise. Recently there has also been work which 
consider the more general noise models, including noisy 1-bit 
observations |[T|, which may be viewed as a case where the 
observations are Bernoulli random variables whose parameters 
depend on a underlying low-rank matrix. The other method 
is developed for Poisson matrix completion but it does not 
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establish a lower bound. Another related work pSj (not in the 
matrix completion setting) considers the case where all entries 
of the low-rank matrix are observed and the observations are 
Poisson counts of the entries of the underlying matrix. In the 
compressed sensing literature, there is a line of research for 
sparse signal recovery in the presence of Poisson noise |[23|- 
flSl and the corresponding performance bounds. The recently 
developed SCOPT p6| , algorithm can also be used to 
solve the Poisson compressed sensing problems. 

In this paper, we extend the theory of matrix completion 
to the case of Poisson observations. We study recovery based 
on maximum likelihood with proper constraints on a matrix 
M with rank less than or equal to r (nuclear norm bound 
||M||h< < ay/rdid 2 for some constant a and bounded entries 
P < Mij < a). Note that the formulation differs from the 
one-bit matrix completion case in that we also require a lower 
bound on each entry of the matrix. This is consistent with 
an intuition that the value of each entry can be viewed as the 
signal-to-noise ratio (SNR) for a Poisson observation, and hence 
this essentially poses a requirement for the minimum SNR. We 
also establish upper and lower bounds on the recovery error, 
by adapting the arguments used for one-bit matrix completion 
(ij. The upper and lower bounds nearly match up to a factor 
on the order of O{log{did 2 )), which shows that the convex 
relaxation formulation for Poisson matrix completion is nearly 
optimal. (We conjecture that such a gap is inherent to the 
Poisson problem). Moreover, we also highlight a few important 
distinctions of Poisson matrix completion compared to the 
prior work on matrix completion in the absence of noise and 
with Gaussian noise: (1) Although our arguments are adapted 
from one-bit matrix completion (where the upper and lower 
bounds nearly match), in the Poisson case there will be a 
gap between the upper and lower bounds, possibly due to the 
fact that Poisson distribution is only locally sub-Gaussian. In 
our proof, we notice that the arguments based on bounding 
all moments of the observations, which usually generate tight 
bounds for prior results with sub-Gaussian observations, do not 
generate tight bounds here; (2) We will need a lower bound 
on each matrix entry in the maximum likelihood formulation, 
which can be viewed as a requirement for the lowest signal- 
to-noise ratio (since the signal-to-noise ratio (SNR) of a 
Poisson observation with intensity / is Vl). Compared with the 
more general framework for M-estimator our results are 
specific to the Poisson case, which may possible be stronger 
but do not apply generally. We also develop several simple 
yet efficient algorithms, including proximal and accelerated 
proximal gradient descent algorithms, and an algorithm which 
is based on singular value thresholding that we examine in 



details. This algorithm can be viewed as a consequence of 
approximating the log likelihood function by its second order 
Taylor expansion and invoking a theorem for exact solution 
of a nuclear norm regularized problem Our algorithm 
is related to |[29|- (D and can be viewed as a special case 
where a simple closed form solution for the algorithm exists. 
We further demonstrate the good performance of the algorithm 
in recovering solar flare images. 

Our formulation and results are inspired by the seminal work 
of one-bit matrix completion |[^, yet with several important 
distinctions. In one-bit matrix completion, the value of each 
observation Yij is binary-valued and hence bounded, whereas 
in our problem, each observation is a Poisson random variable 
which is unbounded; hence, the arguments involve bounding 
measurements have to be changed. In particular, we need to 
bound maxij Yij when Yij is a Poisson random variable with 
intensity Mij . Moreover, the Poisson likelihood function is non 
Lipschitz (due to a bad point when Mij tends to zero), and 
hence we need to introduce a lower bound on each entry of the 
matrix Mij, which can be interpreted as the lowest required 
SNR. Other distinctions also include analysis taking into account 
of the property of the Poisson likelihood function, and using 
Kullback-Leibler (KL) divergence as well as Hellinger distance 
that are different from those for the Bernoulli random variable 
as used in |[T|. 

While working on this paper we realize a parallel work [3^ 
which also studies performance bounds for low rank matrix 
completion with exponential family noise under more general 
assumptions and using a different approach for proof (Poisson 
noise is a special case of theirs). Their upper bound for the MSB 
per entry is on the order of O (log(di + d 2 )r maxjdi, d 2 }/m) 

(our upper bound is O ^log(did 2 )y^r^d 7 ^rd 2 ) 7 ^j), and their 
lower bound is on the order of O (r max{di, (i 2 }/m) (versus 
our lower bound is O ^T{d\ -f d 2 )/m^). 

The rest of the paper is organized as follows. Section |I^ sets 
up the formalism for Poisson matrix completion. Section |nl| 
presents the matrix recovery based on constrained maximum 
likelihood and establishes the upper and lower bounds for the 
recovery accuracy. Section |lv| presents an efficient iterative 
algorithm that solves the maximum likelihood approximately 
and demonstrates its performance on recovering solar flare 
images. All proofs are delegated to Appendix. 

The notation in this paper is standard. In particular, M+ 
denotes the set of positive real numbers; [d] = {1, 2,..., d}; 
I[£] is the indicator function for an event 5 ; |A| denotes the 
number of elements in a set A; diag{Ai} denotes a diagonal 
matrix with a set of numbers {Ai} on its diagonal; Inxm denotes 
an n-by-m matrix of all ones. Let entries of a matrix M be 
denoted by Mij. Let ||M|| be the spectral norm which is the 

largest absolute singular value, ||M||i 7 = be the 

Frobenius norm, ||M||* be the nuclear norm which is the sum 
of the singular values, and finally ||M||oo = max^j \Mij\ be 
the infinity norm. Let rank(M) denote the rank of a matrix M. 
We say that a random variable X follows Poisson distribution 
with parameter A (or X ^ Poisson(A) if its probability mass 
function P(X = k) = e~^\^/{k\)). We also define the KL 
divergence and Hellinger distance for Poisson distribution as 
follows: the KL divergence of two Poisson distributions with 


parameters p and q, where p^q ^ M+ is given by D{p\\q) = 
p log{p/q) — {p—q); the Hellinger distance between two Poisson 
distributions with parameters p and q with p^q ^ is given 

by d‘jj{p, q) = 2 —2 exp | , We further define 

the average KL divergence and Hellinger distance for entries 
of two matrices P, Q e ^ , where each entry corresponds 

to the parameter of a Poisson random variable: 

id 

ij 

11. Formulation 

Suppose we observe a subset of entries of a matrix M G 
index set ft C [di] x [^ 2 ]. The indices are 
randomly selected with E|r^| = m. In other words, I{(ij)eQ} 
are i.i.d. Bernoulli random variables with parameter ml(d\d^. 
The observations are Poisson counts of the observed matrix 
entries 

Yij ^ Poisson(Mij), V(i,i) G (1) 

Our goal is to recover the matrix M from the Poisson 
observations {Yij}(ij)(za- 

We make the following assumptions. First, we set an upper 
bound (T > 0 for the entries of M to entail the recovery 
problem is well-posed | [T8| . This assumption is also reasonable 
in practice; for instance, M may represent an image which 
is usually not too spiky. Second, assume the rank of M is 
less than or equal to a positive integer r < minjdi, 6 / 2 } (this 
assumption is not restrictive in that we only assume an upper 
bound on the rank). The third assumption is characteristic to 
Poisson matrix completion: we set a lower bound > 0 for 
each entry My. This entry-wise lower bound is required for our 
later analysis, and it also has an interpretation of a minimum 
required signal-to-noise ratio (SNR), as the SNR of a Poisson 
observation with intensity I is \/7. 

We recover the matrix M using a regularized maximum 
likelihood formulation. Note that the log-likelihood function 
for the Poisson observation model Q is proportional to 

Fn,Y{X)= YijlogXij-Xij, (2) 

(Li)eo 

where the subscript ft and Y indicate the random quantities 
involved in the maximum likelihood function F. Based on our 
assumptions, we may define a set of candidate estimators 

5 4 {x G : ll^ll* < Wrdid2, 

/? < Xij < a,\/{i,j) G [di] X [d2]}. 

Here the upper bound on the nuclear norm ||M||>^ comes from 
combining the assumptions ||M||oo < and rank(M) < r, 
since ||M||* < ^/Tsir±{M)\\M\\F and \\M\\f^ ^/d^\\M\\oo 
lead to ||M||* < ay/rdid 2 . An estimator M for M can be 
obtained by solving the following convex optimization problem: 

M = arg max Fq^y ) • (4) 
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III. Performance Bounds 


In the following, we establish an upper bound and an 
information theoretic lower bound on the mean square error 
(MSB) per entry \\M — M\\‘^p/{did 2 ) for the estimator in (|^. 

Theorem 1 (Upper bound). Assume M e S, rank{M) = r, 
U is chosen at random following our sampling model with 
E|U| = m, and M is the solution to Then with a probability 
exceeding {1 — C/{did 2 )), we have 


any algorithm which, for any M G S, returns an estimator M. 
Then there exists M e S such that with probability at least 
3/4, 


- Mill > min 

iid2 V m 


(7) 

as long as the right-hand side of Q exceeds ro? j mm{di,d 2 }, 
where Co, Ci, C 2 are absolute constants. 


did: 


-\\M -M\\% < C 


8aT 


1 — e 


-T 




{a{e^ - 2) + 3 log(<i.<i,)) 

(5) 


If m> {di + (^ 2 ) log(di(i 2 ) then simplifies to 


8aT \ / a^/r 

[~T 

{a{e^ - 2) + 3 log(did 2 )) 

Above, T^C'^C are absolute constants, where T^C^C' are 
absolute constants. 


\\M-Mfp < V2C' 



The proof of Theorem is an extension of the ingenious 
arguments for one-bit matrix completion Q. The extension 
for Poisson case here is nontrivial for various aforementioned 
reasons (notably the non sub-Gaussian and only locally sub- 
Gaussian nature of the Poisson observations). An outline of 
our proof is as follows. First, we establish an upper bound for 
the KL divergence D{M\\X) for any M, X G 5 by applying 
Lemma given in the appendix. Seco^, we find an upper 
bound for the Hellinger distance d|^(M, M) using the fact that 
the KL divergence can be bounded from below by the Hellinger 
distance. Finally, we bound the mean squared error in Lemma 
via the Hellinger distance. 

Remark 1. Fixing di,d 2 ,m, a and P, the upper bound in 
Theorem |7] increases as r increases. This is consistent with the 
intuition that our method is better at dealing with approximately 
low-rank matrices (than with nearly full rank matrices). On the 
other hand, fixing di,d 2 ,a, (3 and r, the upper bound decreases 
as m increases, which is also consistent with our intuition that 
M is supposed to be recovered more accurately with more 
observations. 

Remark 2. In the upper bound 0. the mean-square-error per 
entry can be arbitrarily small, in the sense that the upper bound 
goes to zero as di and d 2 go to infinity when the number of 
the measurements m = 0{{di + ^ 2 ) \og^{did 2 )) (m < did 2 ) 
for 6 > 2 when r is fixed, or for 6 > 3 when r is sublinear on 
the order of o{\og{did 2 )). 

The following theorem establishes an information theoretic 
lower bound and demonstrates that there exists an M G 5 such 
that any recovery method cannot achieve a mean square error 
per entry less than the order of 0 (r max{di, d 2 }/m). 


Similar to l^, p^ , proof of Theorem [^relies on information 
theoretic arguments outlined as follows. First we find a set of 
matrices x G 5 so that the distance between any 
identified as is sufficiently large. Then, for 

any X G 5 and the recovered X, if we assume that they are 
sufficiently close to each other with high probability, then we 
can claim that X is the element in the set S that is closest to X. 
Finally, by applying a generalized Fano’s inequality involving 
KL divergence, we claim that the probability for the event that 
X is the matrix in set S closest to X must be small, which 
leads to a contraction and hence proves our lower bound. 

Remark 3. The assumptions in Theorem^can be achieved, for 
example, by the following construction. First, choose an a such 
that a > max{l, 2/3}, and then an r > A. Then, for di (or d 2 ) 
sufficiently large, the conditions that maxjdi, ^ 2 } > Co 
and the right-hand side of ^ exceeds ro?/ mm{di,d 2 } are 
met. Since r < 0{mm{di,d2}/<x‘^), M ^ S, what has been 
chosen is approximately low-rank. In other words, no matter 
how large r is, we can always find di (or d 2 ) large enough so 
that the assumptions in Theorem are satisfied and thus there 
exist an M which can not be recovered with arbitrarily small 
error by any method. 

Remark 4. When m > (di A ^ 2 ) log(did 2 ) and m = 
0{r{di + d 2 ) \og^{did 2 )) with 6 > 2, the ratio between the 
upper bound in 0 and the lower bound in 0 is on the order 
of 0(\og(did2)). Hence, the lower bound matches the upper 
bound up to a logarithmic factor. 

IV. Algorithms 

The matrix completion problem formulated in Q is a 
Semidefinite program (SDP), since it is a nuclear norm 
minimization problem with a convex feasible domain. Hence, 
we may solved it, for example, via the interior-point method | [34| . 
Although the interior-point method returns an exact solution to 
0 , it does not scale well with the dimensions of the matrix di 
and d 2 . 

In the following, we will develop a set of iterative algorithms 
that solves the problem approximately and are more efficient 
than solving the problem as SDP. In doing so, we use the 
framework of proximal algorithms to solve 0 . At first, we 
rewrite search space S as the intersection of two closed and 
convex set in 

Ti ^ {M e : ||M||* < rAFh} and 

r 2 ^ {M G : /? < Mij < ay{i,j) G [di] x [da]}, 


Theorem 2 (Lower bound). Fix a, r, di, and d 2 to be such 
that (T, di, (i 2 > 1, r > 4, Of > 2fi, and max{di, ^ 2 } > Cq. 
Let Ft be any subset of [df x [^ 2 ] with cardinality m. Consider 


where the first set is a nuclear norm ball and the second set 
is a high-dimensional box. Let f{M) = —FQ^y(M) be the 
negative log-likelihood function, then optimization problem 0 
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is equivalent to 


M = arg min f(M). (8) 

Merifirs 

Noticing that the search space S = Fi P r 2 is closed and 
convex and f{M) is a convex function, we can use proximal 
gradient methods to solve Let /r(M) be an indicator 
function that takes value zero if M G F and is oc if M G F^. 
Then problem ^ is also equivalent to 

M = arg min /(M) + (9) 

To guarantee the convergence of proximal gradient method, 
we need the Lipschitz constant L > 0. In our case, Lipschitz 
constant L is a positive number satisfying 

||V/(X) - \/f{Y)\\F <L\\x- y||^,G 5, (lo) 

and hence L = a/ by the definition of our problem. Define 
the projection of Y onto F as 

nr(r) =argmin||X-y||^. 


Algorithm 1 Proximal Gradient for Poisson Matrix Completion 

1: Initialize: [Mo]ij = Yij for (i, j) G and [Mo]ij = (a + 
P)/2 otherwise; the maximum number of iterations K. 

2: for /c = 1, 2,... iF do 

3: Mk = - (l/L)V/(Mfe_i)) 

4: end for 


Algorithmic has linear convergence rate, which is established 
in the following theorem: 


Theorem 3. Let be the sequence generated by Algorithm 

|7] Then for any k > 1, we have 


f{Mk) - f{M) < 


L\\Mo-M\\% 

2k 


Although Algorithm [C can be implemented easily, its linear 
convergence rate is not sufficiently if the Lipschitz constant L 
is large. In such scenarios, we prefer Nesterov’s accelerated 
method for solving this problem which is our Algorithm 
Algorithm has faster convergence, as stated in the following 

Algorithm 2 Accelerated Proximal Gradient for Poisson Matrix 
Completion 

1: Initialize: [Mo]ij = Yij for {i,j) G Ll and [Mo]ij = 
(a + 13)/2 otherwise; Zq = Mq; the maximum number 
of iterations K. 

2: for /c = 1, 2,... AT do 

3: Mk = - (l/L)Yf{Zk-i)) 

4: Zk = Mk + {{k — 1)/ {k + 2)) {Mk — M/c_i) 

5: end for 


Theorem (4] 


Theorem 4. Let {M]f\ be the sequence generated by Algorithm 
1C Then for any k > 1, we have 


f{Mk) - f{M) < 


2L||Mo-M||| 
(A:+ 1)2 


The remaining of the problem is then to deal with the 
projection onto the search space S. Since S is an intersection 
of two convex sets, we may use alternating projection algorithm 
to compute a sequence that converges to this intersection of 
Fi and F 2 , which is stated in Algorithmic Algorithm |C is 


Algorithm 3 Alternating Projection Algorithm 

1: Initialize: Uq is the matrix needed to be projected onto S. 
2: for jf = 1, 2,... do 
3: Vj=UrAUj-l) 

4: Uj=IlrAVj) 

5: If \\Vj — Uj\\F < 10 ^ then return Uj. 

6: end for 


efficient if some closed forms of projection onto the convex 
sets can be achieved. Fortunately, computation of the projection 
onto F 2 in our case is quite simple. Based on the definition 
of Frobenius norm, [Ilr20^)]ij is: ^ if Yij < /3; a if Yij > a; 
Yij if P < Yij < a. Even if there is no closed form expression 
for projection onto Fi, we can use TFOCS, a matlab package, 
to implement this step. 

Similar to the construction in pQ| , we may rewrite 0 as 
M = arg min f{M) + A||M||*, (11) 

MEi '2 

where A is a regularizing parameter that balances the goodness 
of data fit versus regularization. 


The PMLSV algorithm can be derived as follows (in the 
same spirit as [ [^ |, |19|). Let f{M) = —Fq^y{M) be the 
negative log-likelihood function. In the kth iteration, we may 
form a Taylor expansion of f{M) around keep up to 

second term and then solve 


Mfe = arg min [Qt^{M,Mk-i) + A||M||*], (12) 

MEV 2 

with 

Qt,{M,Mk-i) ^ /(Mfe_i) + (M - V/(Mfe_i)) 

+ |||M-Mfe_i|||p, (13) 

where V/ is the gradient of /, tk is the reciprocal of the 
step size in the kth iteration, which we will specify later. By 
dropping and introducing terms independent of M whenever 
needed (more details can be found in lig), ([Tg is equivalent 
to 


Mk = 



1 

/ 1 , \ 

^ A ' 

arg min 
M 

2 


+ 

1 

* 

1_ 


(14) 


Using a theorem proved in fT^ , we may show (in Appendix [A|) 
that the exact solution to (14) is given by a form of Singular 
Value Thresholding (SVT): 


M.=D 


^/tk 


Mfc_i - -V/(Mfe_i) 


(15) 


where Dr{Y) = diagKcr^ — r)+} and (x)+ = max{x,0}. 
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Algorithm 4 PMLSV for Poisson Matrix Completion 

1: Initialize: = Yij for {i^j) G Q and is (a + /3)/2 

otherwise, the maximum number of iterations K, parameters 
r), and L. 

2: for k = 1,2,... K do 

3: C = Mk-I - (l/L)V/(Mfe_i) 

4: C = UDV^ {singular value decomposition} 

5: D„ew = diag((diag(L>) - A/L)+) 

6: Mk = Hr, (UDne^V^) 

1: If f{Mk) > QL{Mk, Mk-i) then L = rjL, go to 4. 

8: If \f{Mk) - QL{Mk,Mk-i)\ < 0.5/K then exit; 


9: end for 



Fig. 1: Solar flare image of size 48-by-48 with rank 10. 


The PMLSV algorithm is summarized in Algorithm In 
the algorithm description, L is the reciprocal of the step size, 
7 ^ > 1 is a scale parameter to change the step size, and K is 
the maximum number of iterations, which is user specified: 
a larger K leads to more accurate solution, and a small K 
obtains the coarse solution quickly. If the cost function value 
does not decrease, the step size is shortened to change the 
singular values more conservatively. The algorithm terminates 
when the absolute difference in the cost function values between 
two consecutive iterations is less than 0.5/AT. 

Under the assumption that the box constraint is not binding. 
Algorithm]^ is the same as that in p9| and convergence analysis 
can be found there. 

Despite of its simplicity, the PMLSV algorithm has a 
surprisingly good performance. With the simple initialization 
for Mo, the magnitude of the gradient is typically small at each 
iteration. Hence, we can ensure Mk to be belong to or be close 
to r by choosing an appropriate step size in the kth iteration. 

The complexity of PMLSV is on the order of 0{d\d2 + d^) 
(which comes from the most expensive step of performing 
singular value decomposition). This is much lower than the 
complexity of solving an SDP, which is on the order of 
0{d\^did2^d\d2). In particular, for a d-hy-d matrix, PMLSV 
algorithm has complexity 0{d‘^) versus solving the SDP has 
complexity 0{d^). 

V. Numerical example 

We demonstrate the good performance of our estimator in 
recovering a solar flare image. The solar flare image is of size 
48-by-48. We break the image into 8 -by -8 patches, then collect 
the vectorized patches into a 64-by-36 matrix: such a matrix is 
well approximated by a low-rank matrix, as demonstrated in 

Fig. [3 

Suppose entries are observed using our sampling model 
with E|U| = m. Let p = m/idxd^), then we observe (100p)% 
of entries. We use L = 10“^ and 77 = 1.1 in the PMLSV 
algorithm. Fig. 2 to Fig. 4 show the recovery result when 80%, 


50% and 30% of the image are observed. The results show that 
our algorithm can recover the original image accurately when 
50% or above of the image entries are observed. In the case of 
only 30% of the image entries are observed, our algorithm still 
captures the main features in the image. The PMLSV algorithm 
is very efficient: the running time on a laptop with 2.40Hz two 
core CPU and 8 GB RAM for all three examples are less than 
1.2 seconds (much faster than solving SDP). 



(a) p = 0.8. (b) A = 0.1, K = 2000. 


Fig. 2: (a) Observed image with 80% of entries known (dark spots 
represent missing entries), (b) Recovered image with A = 0.1 and 
no more than 2000 iterations, where the elapsed time is 1.176595 
seconds. 



(a) p = 0.5. (b) A = 0.1, K = 2000. 


Fig. 3: (a) Observed image with 50% of entries known (dark spots 
represent missing entries), (b) Recovered image with A = 0.1 and 
no more than 2000 iterations, where the elapsed time is 1.110226 
seconds. 



(a) p = 0.3. (b) A = 0.1, K = 2000. 


Fig. 4: (a) Observed image with 30% of entries known (dark spots 
represent missing entries), (b) Recovered image with A = 0.1 and 
no more than 2000 iterations, where the elapsed time is 1.097281 
seconds. 
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Appendix A 

Singular value thresholding 

Consider the following problem 

min \h\Y-X\\% + T\\Y\\A, (16) 

where X G is given and r is the regularization 

parameter. For a matrix X G ^dink r, let its singular 

value decomposition be X = U'EV^, where U G 
V G S = diagdcr^}, i = 1, 2,r), and is a singular 

value of the matrix X. For each r >0, define the singular 
value thresholding operator as: 

Dr{X) ^ UDr{T.)V^. (17) 

The solution to ( [T^ is given by singular value thresholding 
according to the following theorem 

Theorem 5 (Theorem 2.1 in (T^). For each r >0, and 

X G M^ix^2. 

Z),(X) = arg min 11 ||y - + T||r||*|. (18) 
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Appendix B 
Proofs 

In the following, Lemma is used in proving Lemma 
and Lemma 1^ corresponds to Lemma 3 in 

Lemma 1. Assuming Y ^ Poisson (A) is a Poisson random 
variable with X < a. Then P(F — X > t) < Vt > to for 
to = a{e^ — 3). 


Proof: 


We introduce 0 > 0, 

F{Y -X>t)=F{Y >t AX) 

=P {OY >0{tAX))=F (exp (OY) > exp {0 (t + A))). 


Using Markov inequality, we can have 

F{Y-X>t)<exp {-d {t + A)) E (e^^). 
= exp(—d(X +1)) ■ exp (A(e^ — 1)) 

Letting 0 = 2, 


p(y-A>t) 
exp(—t) 


exp (—t + A(e^ 


Define that 

to = a{e‘^ — 3), 


3)). 


to make < 1, we derive that P (F — A > t) < e ^ 

when 


t >to > A(e^ — 3) 


Lemma 2. Let Fqy{X) be the likelihood function defined in 
(0 and S be the set defined in Q, then 


sup |Ff,,y(X)-EFo,y(X)| 

Vxes 

> C [ayfrjfi) (a(e^ - 2) + 3 log((ii(i2)) • 
[xjm{dx Ad2) A did2 log(did2)^ | 


- did2’ 


(19) 


where C and C are absolute positive constants and the 
probability and the expectation are both over ft and Y. 


Proof: In order to prove the lemma, we let e^j are i.i.d. 
Rademacher random variables. In the following derivation, the 
first inequality is due the Radamacher symmetrization argument 
(Lemma 6.3 in p6| ) and the second inequality is due to the 
power mean inequality: (a + h)^ < 2^~^{afi A b^) if a, 6 > 0 
and h > 1. Then we have 


E 


sup|Fn,y(X)-EFn,y(X)|'‘ 

_xes 




h~ 

sup 

^ij'^idd)^^] O^ij ^ij ^ij ) 


xes 

id 



= 2'‘E 


sup 




hj 




h~ 

sup 



xes 

id 






h~ 

_^22'»-1e 


^i3^[iid)^^]^i3 



id 



(20) 


where the expectation are over both Q and Y. 


In the following, we will use contraction principle to further 
bound the first term of (j^. We let 0(t) = —/3 log(f + 1). We 
know 0(0) = 0 and |0 (t)| = |/3/(t + 1)|, so |0'(t)| < 1 if 
t > 0 — 1 . Setting Z = X — l(i^xd 2 ^ we have Zij > 
0 - 1, V(i, j) G [di] X [^ 2 ] and ||Z||* < ax/rdfdhz A by 

triangle inequality. Therefore, 0(Z^0 is a contraction and it 
vanishes at 0. By Theorem 4.12 in and using the fact that 
\{A,B)\ < ||A||||5|0, we have 




h~ 

sup 

^ij'^ljid)^^] 0^i3 ( 


xe5 

id 







h~ 


max Yf 


sup 

^ V ^ij'^ljid)^^] (( ^^S^fij)) 



id _ 


xes 

id 








h~ 

22/i-lg 

max Yf 
id ^ 

E 


id ^ 2 




E 

E 

E 


max Yf 

E 

sup 

^i3\{id)^^\ 

id 


xes 

id 




-I 

max Yf 

E 

sup \{iXnoE,zd 

id 


_xes 


max Yf 
id ^ 

E 

sup ||i;oAnf||^|| 
_xe5 


h' 


< 2^^“^ 



{ay/r A 1 )^ (^Vdid 2 ^ 


E 


max Yf 
id 


( 21 ) 


where E denotes the matrix with entries given by Cij, Aq 
denotes the indicator matrix for ft and o denotes the Hadamard 
product. 
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Similarly, the second term of can be bounded as follows: 


-t2h-l 


E 


<22ft-lg 


sup 

xes 




^,3 


<2 


2h- 


sup ii^oAoriixii: 

^ {a^/r)^ (^y/did2^ E[||EoAq||^ 


Plugging ( [2T] ) and (22) into (20), we have 


E 


sup \Fn,Y{X) -EFn,Y{X)f' 


<2 


,xe5 
2h- 




maxK/’ 


+ 1 


( 22 ) 


i(av^+l)'*(v/^) E[||EoAnf]- (23) 


To bound E [||E’ o Aj^||^], we can use the result from Q 
if we take h = \og{did 2 ) > 1: 

E[||^oAof] 

<C„ (2(1 + V6))'‘ (l ”’(<li + 'i2) + <hdMM, ) 


for some constant Cq. Therefore, the only term we need to 
bound is E [max^^j F-J]. 

From Lemmaif t > to, then for any G [di] x [d 2 ], 
the following inequality holds since to > a: 

^ {\^ij ~ ^ij\ ^ ^ (Fij ^ Xtij + t) + E (Yij < Mij — t) 

< exp(—t) + 0 
= nwij>t), 

(24) 

where Wij are independent standard exponential random 
variables. 


Below we use the fact that for any positive random variable 
X, we can write EX = ¥{X > t)dt^ allowing us to bound 


E 


maxF/: 




< 2 

= 2 

< 2 

< 2 

< 2 


2h-l I h 


+E 


nax|Fi_^- - M^j\ 




^ -E J E ^rnax \ Yij — Mij\^ > t^ dt^ 

(^0)"^+ [ Efmax|F,,--M,,f >t 
J(toF V 


2/.-1 I ^ 

Fto)^ 

a" + (fo)^ F J E ^rnaxlT^^ > t^ dt 

^ Q(^ + (fo)^+E 


2h-l I 


dt 


max IF,^- 
i,j ^ 


(25) 


Above, firstly we use triangle inequality and power mean 
inequality, then along with independence, we use ( [24| ) in the 
third inequality. By standard computations for exponential 
random variables. 


E 


max VL^- 
ij ^ 


< 2h\ + log^{did2). 


Thus, we have 


(26) 


E 


max YJ] 
id ^ 


< 22 '*-! (Y + {tof + 2h\ + \og^{did2)) . 

(27) 


Therefore, combining ([27]) and ([23]), we have 


E sup |Fn.y(X)-EFn,F(X)|^ 
.xes 

<24/1-1 ' - ' " ' ' ^ 


-^{aV^+Y(^^/XF| E[||EoAnf]. (28) 

\ h 

j + (to)^ + 2h\ + \og^{did2)^ . 


Then, 


E 


sup \Fn,Y{X) -EFn,Y{X)f 


xes 


<16 


{aV^Fl) (y^)E[||L;oAo||'^]^ 


<16 


—j (a + to + 2h + log{did 2 )) 
2 

(a{e^ -2) + 3log{did2)) 


(aV^+l) (v^)e [11^0 Anil 


hi h 


<128 (^1 + \Yj Cq" “ 2 ) + 3\og{did2)) 

(^\/m{di + d2) + did2 log{did2)^ . 


(29) 


where we use the fact that (a^+6^+c^+(i^)^/^ < a-\-bYcYd if 
a, 6, c, (i > 0 in the first inequality and we take h = \og{did 2 ) > 
1 in the second inequality. 


Moreover when C > 128 (l + v^fi) e. 


Co 


( 128(1 + ^/6) 

C 


\og(did2) 

< 


Co 

did2 


8 






































Therefore we can use Markov inequality to see that 


P 


=P 


<E 


sup \Fa,Y{X) -EFn,Y{X)\ 

{xes 

> C {a^/j3) (a(e^ - 2) + 31og((ii(i2)) 

{^■s/m{di + d2) + did2 log{did2)^ | 

fsup \Fn,Y{X)-EFa,Y{X)\’^ 

[xes 

> C {ay/r/0) (a(e^ - 2) + 31og((ii(i2)) 
{\/m{di +d2) + did2 \og{did2)^ | 


sup \Fn,Y{X) - EFn,Y{X)\^ 
xes 


{(C' {a^/r/P) [a{e^ - 2 ) + 3log(di(i 2 )) ■ 

[y/m{di +d2)+did2 log(did2))) } 

C 

d^2' 


1. For all X G X, each entry has \Xij \ = ay. 


2. For all e x, i ^ j, ||X(*) - > 

a‘^y‘^did2l2. 

Lemma 5. For x,y > 0, D{x\\y) < {y — /y. 


Proof: First assume x < y. Let z = y — x. Then ^ > 0 
and D{x\\x F z) = xlog + 2 ;. Taking the first derivative 
of this with respect to 2 ;, we have ■^D{x\\x z) = 

Thus, by Taylor’s theorem, there is some ^ G [0, z] so that 
D{x\\y) = D{x\\x)FZ' Since the right-hand-side increases 

in we may replace ^ with 2 ; and obtain D{x\\y) < . 

For X > y, with the similar argument we may conclude that 
for 2 ; = ^ — X < 0 there is some ^ G [ 2 ;, 0] so that D{x\\y) = 
D{x\\x) + 2 ; • Since 2 ; < 0 and + 0 increases in 
then the right-hand-side is decreasing in We may also replace 
^ with 2 : and this proves the lemma. ■ 


where C' > 128(1 + \/ 6 )e and C are absolute constants. 

■ 

Lemma 3. Let (3 < Mij^Mij < a,V(i,j) G [df x [d 2 ], then 

where T = — /3)^. 


AaT did2 


Proof of Theorem Lemma Lemma and Lemma 
are used in the proof. In the following, the expectation are 
taken with respect to both ft and {Yij}. First, note that 


Proof: Assuming x is any entry in M and y is any entry 
in M, then ^ < x^y < a and 0 < \x — y\ < a — /3. By the 
mean value theorem there exists an i{x^y) G [P^a] such that 

\{Vx-y/yf = ]:{^^^2=={x-y^ = < T. 


Fa,Y{X)-Fa,Y{M) = ^ 

Then for any X G 5, 

E[Fn,y(X)-Fn,y(M)] 

““ E 




did2 


^,j 




8C(a;,2/) 


The function f{z) = 1 — e ^ is concave in [ 0 ,+ 00 ], so if 
2 : G [0, T], we may bound it from below with a linear function 


m 

did‘2 

m 

did‘2 


E 


- {Xi^ - Mi^) 


(32) 


Y,D{Mij\\Xij) = -mD{M\\X). 


l-e~^ > 


1 — e 


-T 


(30) 


Plugging z = \[\fx-= 8 g(ly) {x-yY in (30), we have 


2 - 2 exp (^-i(Va; - > 

1 — 1 2 

> ——— z—ix — y) . 


1 — e ^ 1 

T 4.^{x,y) 


{x-yf 


For M ^ S, WQ know M E S and Fq y(M) > y(M). 

Thus we write 

0 < Fq^y{M) - Fq^y{M) 

= Fq^y{X[) + EF^^y (M) — EF^^y (M) 

+ EFo,y (M) — EF^^y (M) — FQ^y(M) 


< E 


Fn,Y{M) - Fn,Y{M) 


+ 


(31) 

Note that pT] ) holds for any x and y. This concludes the proof. 

■ 

Lemma 4. Let H = {M : ||M||^ < a^/rdhfdf^ Halloo E o:} 
and y <1 be such that is an integer. Suppose r/ 7 ^ < di, 
then we may construct a set x ^ H of size 

rd2 
167 ^ 


Fo,y(M) - EFo,y(M) + |Fo,y(M) - EFo,y(M)| 

< -mD{M\\M) + 2 sup |Fo,y(X) - EFo,y(X)| . 

Applying Lemma we obtain that with probability at least 

{l-C/{did2)), 

0 < —mD{M\\M) 

+ 2C' [a^/r/f3) (a{e‘^ - 2) F 3log{did2)) • 
(^^/m{dl + d2) + did2 log(did 2 )^ . 


Ixl > exp 

with the following properties: 
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After rearranging terms and applying the fact that s/did 2 < 



































(33) 


di + d 2 , we obtain 

D{M\\M) <2C' {a^/F/|3) - 2) + 31og((iid2)) ■ 

(^^ym{dl + d2) + (<ii + d2Y log(did2)) ■ 

Note that the KL divergence can be bounded below by the 
Hellinger distance (Chapter 3 in p7|): 

d]i{x,y) < D{x\\y). 

Thus from ( [^ , we obtain 

M) <2C' (a^/r/p) (a{e‘^ - 2) + 3 log(did 2 )) • 

(^^ym{dl + d2) + {di + d2Y log(did2)) 


(34) 


Finally, Theorem is proved by applying Lemma 


Proof of Theorem 

We will prove by contradiction. Lemma and Lemma are 
used in the proof. Without loss of generality, assume d 2 > di. 
Choose e > 0 such that 


e = mm 


I 256 V m I 


where C 2 is an absolute constant that will be be specified later. 
First, choose 7 such that ^ is an integer and 

4V2e 8e 1 

—- < 7 < — < 

a a 2 

We may make such a choice because 


o 

a^r 


o 

a^r 


r 

64e2 - 7 “ 32e2 


and 


2 

a^r 


2 

a^r 


2 

a^r 


32e2 64e2 646^ 


> 4a^r > 1. 


Furthermore, since we have assumed that is larger than 
Cro?jd\, rl^‘^ < di for an appropriate choice of C. Let 
7 the set defined in Lemmaby replacing a with a/2 
and with this choice of 7 . Then we can construct a packing set 
X of the same size as Xa /2 7 defining 

X ^ {x'+a (1 - I) Uxd. : X'e x:./2,4 • 

The distance between pairs of elements in x is bounded since 


J^did2 


> 4^1^26 . 


(35) 


Now consider an algorithm that for any X e S returns X 
such that 


1 


did2 


\\X-X\\%<e‘^ 


(36) 


with probability at least 1/4. Next, we will show this leas to 
an contradiction. Let 

X*=arg min \\X^^ - X\\l, 
xii)ex 

by the same argument as that in Q, we have X* = X as 
long as ( [ 3 ^ holds. Using the assumption that (36) holds with 
probability at least 1/4, we have 


P(X* T^X) < -. 


(37) 


Using a generalized Fano’s inequality for the KL divergence in 
1 , we have 


P(X* 7 ^ X) > 1 - 


max^w^;^(oZ)(XW||X(0) + l 

log Ixl 


(38) 


Define D 4 D(X«||XW) = ||xg7 We 

know that each term in the sum is either 0, D{a\\a'), or 
D{a'\\a). From Lemmasince a' < a, we have 

^ ^ ^ 64me^ 

“ a' ~ a' ‘ 

Combining ( [T7| ) and ( [38] ), we have that 

\<i-nx^x*)< -^4/ 

4 log Ixl 


64me^ 


< 167^ 


1 


64me^ 


rd2 


< 1024e^ 


+ 1 


(39) 


a‘^rd2 


Suppose 64me^ < a', then with (39), we have 
4 - 

which implies that a‘^rd 2 < 32. Then if we set Co > 32, this 
leads to a contradiction. Next, suppose 64me^ > a', then with 
we have 

128TOe2 


- < 1024e2 
4 

Since 1 — 7 > 1/2, we have 


(1 — j)a^rd2 


e" > 


a' 


3/2 


1024 



Setting C 2 < 1/4096, this leads to a contradiction. Therefore, 
([ 3 ^ must be incorrect with probability at least 3/4. This 
concludes our proof. 


Define a' = {1 — j)a, then every entry of X G x has Xij G 
{a, a'}. Since we have assumed r > 4, for every X G x» we 
have 

||X|U = ||X' + « (1 - 4 UxdJU < IIX'II* + a(l - 


< -y/rdid 2 + a\/did2 < a\/rdid2, 

for some X' G x'a /2 7 * the 7 we choose is less than 1/2, 
a' is greater than a/2. Therefore, from the assumption that 
f3 < a/2, we conclude that x C 5. 


Lemma 6. If f is a closed convex function satisfying Lipschitz 
condition then for any X, U G 5, the following inequality 
holds: 

f{Y) < /(X) + (V/(X), Y-X) + ^\\Y- Xfp. 
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Proof: Let Z = Y — X, then we have 


so we have by also taking t = IjL 


f{Y) = f{X) + (V/(X), Z) + f (V/(X + tV) - V/(X), Z) dt 

Jo 

<fiX) + {XfiX),Z)+ [ \\fiX + tV)-Vf{X)\\F\\Z\\F dt 
Jo 

<f{X) + (V/(X), Z)+ [ Lt\\Z\\% dt 
Jo 

=f{X) + (V/(X), Y-X) + ^\\Y- X\\%, 


g{Mk) - 9 {M) 

k—1 
^ i=0 

j k-i (44) 

E (ll^^ - Y?f - \\Mi+i - Mill) 

^ i=0 

^ L||Mo-M||| 

2k 


where we use Taylor expansion with integral remainder in the 
first line, the fact that dual norm of Frobenius norm is itself in 
the second line and Lipschitz condition in the third line. 


Finally, we proves the theorem by noticing that h{Mk) = 
h{M) = 0 for any k > 0. 


Proof of Theorem We will use some results in the 
above proof. Defining that Vq = Mq and for any /c > 1, 

- ^<^-1 + - - Mk - i ) ■ 

k^l Ok 

Setting t = IjL, then by noticing that 

= ^/c-1 — tGt{Zk-l), 


Proof of Theorem As is well known, proximal mapping 
of a E G 5 associated with a closed convex function h is given 
by 

= argimn (^t ■ h(X) + ^11^ - , 

where t > 0 is a multiplier. Define for each M G S that 

Gt{M) = I {M- prox,^ (M - iV/(M))), 
then we can know by the characterization of subgradient that 


Gt{M) - Vf{M) e dh{M), 


(40) 


where dh{M) is the subdifferential of h at M. Noticing that 
M — Gt{M) e S, then from Lemmawe have 

f{M-tGt{M)) < /(M)-(V/(M),fG*(M)) + |||Gt(M)|||, 

(41) 

for all 0 < t < l/L. In our case, h{M) = IsLM)' Defining 
g{M) = f{M) + h{M), combining (40) and (41) and using 
the fact that / and h are convex functions, wenave for any 

9{M - tGt{M)) < 9{Z) + {Gt{M), M - Z) - ^ ||Gt(M)|||. 

(42) 


Taking Z = M in ( |42| ), then we have for any k > 0 
5(Mfc+i) -5(M) <{Gt{Mk),Mk-M) - ^||Gt(Mfe)||| 
= 2(||M,-M|||-||Mfe+a-M|||) 


(43) 


where we use the fact that (M, M) = ||M|||.. By taking Z = 
Mk in (42) we know that f{Mk-\-i) < f{Mk) for any /c > 0, 


we can rewrite 14 as 




^k 


Taking Z = Mk-i and E 
combination we have 


= M in (42) 


and make convex 


g{^k) E(1 ~ CLk)g{^k-l) + CLkg{^) 

+ afc(Gt(Zfc_i),f4_i -M) - ^||Gt(Zfe_i)||| 

= (1 “ CLk)g{^k-l) + CLkdi^) 

+ § Mill-IlLfe-Mill). 

(45) 

Rearranging the terms before we have 

\{9{Mk)-9{M)) + Yvu-MfF< 

J-n 1 (46) 

— ^(5(Mfe_a) - 9{M)) + -||14_i - Mill. 

Qjk zr 

Noticing that (1 — a/c)/(a|) < for any /c > 1, we 

apply inequality ( [46| ) recursively to get 

T(g(Mfe)-fl(M)) + F||14-M||| < 1 ||Mo-M|||, (47) 

which proves the theorem. 
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